In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we present an information-theoretic framework for analyzing the disparate impact of a binary classification model. We view the model as a fixed channel and quantify disparate impact as the divergence between the output distributions of the two groups. We then seek a \textit{correction function} that perturbs the input distribution of each group so as to align the output distributions. We formulate an optimization problem whose solution is a correction function that renders the output distributions statistically indistinguishable, and we derive a closed-form expression for this correction function that allows it to be computed efficiently. We illustrate the use of the correction function on a recidivism prediction task derived from the ProPublica COMPAS dataset.
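To make the measurement concrete, the following is a minimal sketch of quantifying disparate impact as a divergence between the output distributions of a fixed binary classifier over two groups. The threshold model, the beta-distributed group scores, and the choice of total variation distance as the divergence are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed classifier (the "channel"): a simple threshold rule.
def model(x):
    return (x > 0.5).astype(int)

# Synthetic scores: the two groups have different input distributions.
x_a = rng.beta(2, 5, size=10_000)  # group A (assumed distribution)
x_b = rng.beta(5, 2, size=10_000)  # group B (assumed distribution)

def output_dist(x):
    """Empirical distribution of the binary model output over a group."""
    p1 = model(x).mean()
    return np.array([1.0 - p1, p1])

p_a, p_b = output_dist(x_a), output_dist(x_b)

# Disparate impact measured as the total variation distance between
# the two output distributions (one possible divergence choice).
tv = 0.5 * np.abs(p_a - p_b).sum()
print(f"TV distance between group output distributions: {tv:.3f}")
```

A correction function in the paper's sense would perturb `x_a` and `x_b` before they reach `model` so that this divergence is driven to zero.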